real-world persona
RAGs to Riches: RAG-like Few-shot Learning for Large Language Model Role-playing
Rupprecht, Timothy, Nan, Enfu, Akbari, Arash, Akbari, Arman, Lu, Lei, Maan, Priyanka, Duffy, Sean, Zhao, Pu, He, Yumei, Kaeli, David, Wang, Yanzhi
Role-playing Large language models (LLMs) are increasingly deployed in high-stakes domains such as healthcare, education, and governance, where failures can directly impact user trust and well-being. A cost effective paradigm for LLM role-playing is few-shot learning, but existing approaches often cause models to break character in unexpected and potentially harmful ways, especially when interacting with hostile users. Inspired by Retrieval-Augmented Generation (RAG), we reformulate LLM role-playing into a text retrieval problem and propose a new prompting framework called RAGs-to-Riches, which leverages curated reference demonstrations to condition LLM responses. We evaluate our framework with LLM-as-a-judge preference voting and introduce two novel token-level ROUGE metrics: Intersection over Output (IOO) to quantity how much an LLM improvises and Intersection over References (IOR) to measure few-shot demonstrations utilization rate during the evaluation tasks. When simulating interactions with a hostile user, our prompting strategy incorporates in its responses during inference an average of 35% more tokens from the reference demonstrations. As a result, across 453 role-playing interactions, our models are consistently judged as being more authentic, and remain in-character more often than zero-shot and in-context Learning (ICL) methods. Our method presents a scalable strategy for building robust, human-aligned LLM role-playing frameworks.
- North America > Canada (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > New York (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Digital Avatars: Framework Development and Their Evaluation
Rupprecht, Timothy, Chang, Sung-En, Wu, Yushu, Lu, Lei, Nan, Enfu, Li, Chih-hsiang, Lai, Caiyue, Li, Zhimin, Hu, Zhijun, He, Yumei, Kaeli, David, Wang, Yanzhi
We present a novel prompting strategy for artificial intelligence driven digital avatars. To better quantify how our prompting strategy affects anthropomorphic features like humor, authenticity, and favorability we present Crowd Vote - an adaptation of Crowd Score that allows for judges to elect a large language model (LLM) candidate over competitors answering the same or similar prompts. To visualize the responses of our LLM, and the effectiveness of our prompting strategy we propose an end-to-end framework for creating high-fidelity artificial intelligence (AI) driven digital avatars. This pipeline effectively captures an individual's essence for interaction and our streaming algorithm delivers a high-quality digital avatar with real-time audio-video streaming from server to mobile device. Both our visualization tool, and our Crowd Vote metrics demonstrate our AI driven digital avatars have state-of-the-art humor, authenticity, and favorability outperforming all competitors and baselines. In the case of our Donald Trump and Joe Biden avatars, their authenticity and favorability are rated higher than even their real-world equivalents.
- North America > United States (0.89)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)